In the present article, we present a means to remotely and transparently estimate an individual's level of
fatigue by quantifying changes in his or her voice characteristics. Using voice analysis to estimate fatigue
differs from established cognitive measures in a number of ways: (1) speaking is a natural activity requiring
no initial training or learning curve, (2) voice recording is an unobtrusive operation allowing the speakers to go
about their normal work activities, (3) using telecommunication infrastructure (radio, telephone, etc.), a diffuse
set of remote populations can be monitored at a central location, and (4) often, previously recorded voice data
are available for post hoc analysis. By quantifying changes in the mathematical coefficients that describe the
human speech production process, we were able to demonstrate that, for speech sounds requiring a large average
airflow, a speaker's voice changes in synchrony with both direct measures of fatigue and changes predicted
by the length of time awake.


The unique characteristics of the military and aviation
environments make war fighters, pilots, and air traffic
control personnel particularly susceptible to fatigue.
Environmental factors such as movement restriction, poor
airflow, low light levels, background noise, and vibration
are known to cause fatigue (Mohler, 1996). In addition,
the introduction of advanced automation has changed the
nature of the job for these individuals. "Hands-on" activities
have been replaced by greater demands on the crew
to perform vigilant monitoring of automated systems, a
task that people find tiring if performed for long periods
of time (Colquhoun, 1976). Personnel operating at unacceptable
levels of cognitive performance present a danger
to their mission, to themselves, and to their work team.

An analysis of NASA's Aviation Safety Reporting
System (ASRS) revealed that 3.8% of air transport crew
member error reports were directly associated with fatigue
(Lyman & Orlady, 1981). However, when factors
related to fatigue are considered, such as inattention or
miscommunication, the number increases to 21.1%. Fatigue
also results in an increase in what a person might
consider acceptable risk in an attempt to avoid additional
effort (Barth, Holding, & Stamford, 1976; Shingledecker
& Holding, 1974).

The ability to quickly and unobtrusively monitor an
airman's or soldier's level of alertness prior to and during
the undertaking of mission-critical activity would provide
commanders with critical information regarding personnel
assignments, quite possibly save lives, and increase
the likelihood of mission success. Unfortunately, there are
no cognitive assessment tests that have been proven to be
effective in the field under conditions of high stress and
severely limited time.

In this article we describe and evaluate the application
of a speech-based approach to estimating a speaker's
level of fatigue. Using the voice characteristic metrics that
are necessary in the implementation of most automatic
speech recognition (ASR) software algorithms, we quantify
the change in speakers' voice quality as they become
fatigued. These changes are compared to widely accepted
empirical and model-based measures of fatigue. The next
section discusses the physiology of speech production as
an introduction to our approach and includes previous
attempts to relate voice characteristics to fatigue.
Human Speech Production and Its Association
With Fatigue

Recognizable speech is produced by a continuous adjustment
of the resonating characteristics of the vocal
tract. This system consists of an excitation region (lungs,
diaphragm, and vocal folds) and a filter that is adjusted by
changes in the position of the pharynx, tongue, lips, jaw,
and soft palate.

The production of speech sounds is a process that relies
on precise interactions between the sensory and motor
systems. Control of the voice articulators is achieved through
a biofeedback process involving the sensing and monitoring
of the vibration of the vocal folds through the sound
and feeling that they create. With increasing fatigue or
alcohol-induced impairment, this system is disrupted.
Different speech-based manifestations of this disruption
have been reported by a number of researchers. Previous
work associating changes in voice with fatigue has generally
focused on discrete characteristics of the speaker's
voice. These include pitch and word duration (Whitmore
& Fisher, 1996) and the timing between articulated sounds
(Vollrath, 1994). Changes in voice spectral parameters
have been associated with alcohol-related impairment
(Brenner & Cash, 1991) and hypoxia (Saito et al., 1980).
The significant effects of circadian influences on voice
characteristics have been observed in a number of studies
(Roth et al., 1989; Whitmore & Fisher, 1996).

In our analytical procedure, we monitored the organized
collection of sounds of the International Phonetic Alphabet
(IPA), a listing of 41 phones of which all English words
are composed. In this manner, identical words need not be
present in the recorded or "online" vocalizations, and more
subtle changes may be detected than with whole-word
analysis. We also quantified the speech signals using metrics
that are representative of the entirety of the speaker's voice.
This process is described in the next section.

Quantification  of  Voice 

Mathematically, the speech signal consists of a convolution
of the excitation waveform with the filter description
in the time domain or, equivalently, a multiplication of the
transfer functions of the two regions in the frequency domain.
It is possible to process the recorded speech signal $|S(\omega)|$ in
a manner that will separate the isolated filtering effects
$|F(\omega)|$ from the excitation signal $|E(\omega)|$. In this process,
the spectral characteristics of the speech signal are obtained
and the logarithm of the resulting amplitude is calculated.
Because the logarithm turns the product of the excitation and
filter spectra into a sum, this provides a computed measure from
which excitation and filter components can be separated, as
seen in Equation 1.

$$\log|S(\omega)| = \log|E(\omega)| + \log|F(\omega)| \tag{1}$$

The resultant log magnitude spectrum is then transformed
back to the time domain using a discrete Fourier
transform. This process ultimately results in the formation
of a discrete (and manageable) number of coefficients
(called cepstrum coefficients) that represent separate
filter and excitation signals in the time domain. It is important
to note that the entire speech production process
is now characterized by only these few cepstrum coefficients.
Isolation of the spectral coefficients from either
the excitation or filter sections is accomplished by the removal
of the irrelevant cepstrum coefficients followed by
another conversion to the frequency domain.
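
The cepstral steps described above can be sketched in a few lines. This is a minimal illustration using NumPy, not the project's actual software: the function names, the `1e-12` offset, and the synthetic random "excitation" and "filter" signals are all assumptions made for the example, and the mel-scale filter bank used in practice is omitted.

```python
import numpy as np

def real_cepstrum(frame):
    """Return the real cepstrum of one speech frame (1-D array)."""
    spectrum = np.fft.fft(frame)
    # Small offset (an assumption of this sketch) guards against log(0).
    log_magnitude = np.log(np.abs(spectrum) + 1e-12)
    # The log magnitude spectrum is real and even, so its inverse DFT is real.
    return np.fft.ifft(log_magnitude).real

def smoothed_log_spectrum(cepstrum, n_keep):
    """Zero all but the lowest n_keep coefficients, then return to frequency."""
    liftered = np.zeros_like(cepstrum)
    liftered[:n_keep] = cepstrum[:n_keep]
    if n_keep > 1:
        # Keep the mirrored coefficients so the result stays real.
        liftered[-(n_keep - 1):] = cepstrum[-(n_keep - 1):]
    return np.fft.fft(liftered).real

# Synthetic check of Equation 1: circularly convolving an "excitation"
# with a "filter" gives a cepstrum equal to the sum of the two cepstra.
rng = np.random.default_rng(0)
excitation = rng.standard_normal(256)
vocal_filter = rng.standard_normal(256)
speech = np.fft.ifft(np.fft.fft(excitation) * np.fft.fft(vocal_filter)).real
cepstrum_sum = real_cepstrum(excitation) + real_cepstrum(vocal_filter)
```

Keeping only the low-order coefficients in `smoothed_log_spectrum` corresponds to isolating the slowly varying filter (vocal tract) contribution; keeping the complementary high-order coefficients would isolate the excitation instead.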

From this discussion, the entire human speech production
process may be reduced to and described by a manageable
number of coefficients. Thus, instead of tracking
all of the fatigue-related changes in specific vocal metrics,
such as pitch or duration, we can track changes in
the entire speech production system through the analysis of
these coefficients. Our developmental software calculates
36 mel-frequency cepstrum coefficients (MFCCs) comprising
12 cepstral coefficients (MFCC 1-MFCC 12)
and their first and second time derivatives. From this point
we will refer to these 36 components as the voice vector.
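
As a rough sketch of how a 36-component voice vector can be assembled from 12 cepstral coefficients per frame, the example below appends first and second time derivatives. It assumes the derivatives are simple finite differences across frames (`numpy.gradient`); speech recognition front ends often use a regression window instead, and the function name `voice_vectors` is hypothetical.

```python
import numpy as np

def voice_vectors(mfcc):
    """Stack 12 MFCCs with their first and second time derivatives.

    mfcc: array of shape (n_frames, 12), one row per time frame.
    Returns an array of shape (n_frames, 36).
    """
    mfcc = np.asarray(mfcc, dtype=float)
    # Finite-difference derivatives along the frame (time) axis;
    # this differencing scheme is an assumption of the sketch.
    delta = np.gradient(mfcc, axis=0)
    delta2 = np.gradient(delta, axis=0)
    return np.hstack([mfcc, delta, delta2])
```

For a sequence of frames, each row of the result is one voice vector: columns 0-11 hold the cepstral coefficients, columns 12-23 the first derivatives, and columns 24-35 the second derivatives.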

These results were achieved by using speech recognition
software developed for this project by the Institute for
Signal and Information Processing (ISIP) at Mississippi
State University. As depicted in Figure 1, this software
calculates 12 cepstral coefficients along with the first and
second derivatives (the voice vector). This voice characterization
capability was then used to analyze the experimental
data described in the following section.

Figure 2 illustrates how the voice vector of a speaker
changed over a 4-day period of sleep restriction.
This example represents the voice vector generated by
a single subject's utterance of the sound "t." The legend
identifies the correlation coefficients for the voice vector
at 12, 39, and 78 h awake with the voice vector at the onset
of testing (Trial 1 at 12 h awake).

EXPERIMENTAL  DATA 

The  data  provided  for  this  study  consisted  of  296  speech 
trials  acquired  from  31  normal  talkers  from  three  separate 
experimental  protocol  test  conditions: 

Group 1 (FAA Group)

Six nonmedicated subjects reciting 31 unrelated words
every 6 h over a period of 34 h of sleep loss.

As part of a larger FAA study that involved a 34-h period
of sleep deprivation (Nesthus, Scarborough, & Schroeder,
1998), subjects were asked to recite a list of 31 words during
six test times (10:00 a.m., 4:00 p.m., 10:00 p.m., 4:00 a.m.,
10:00 a.m., and 4:00 p.m.). The test times were selected to
represent circadian high and low points in alertness and performance.
Voice was recorded using a Digital Audio Tape
(DAT) recorder (TASCAM Model No. DA-P1) and a handheld
microphone. Also measured during these test times
were sleep onset latencies (SOL), representing an objective
method for determining sleepiness. This is described in the
next section. Between test sessions, these subjects participated
in low arousal activities such as reading, watching TV,
and schoolwork but were not allowed to sleep.

Group 2 (Air Force Group)

Nine medicated and eight placebo subjects recited eight
fixed phrases every 3 h during 66 h of testing. The initial
data collection occurred at 1800 h on Day 1. Given
a wakeup time of 0600 h, each subject experienced approximately
78 h of restricted sleep by the end of testing
(1200 h on Day 3).

[Figure 1 block labels: Mel-Spaced Cepstrum; First Derivative of Cepstrum; Second Derivative of Cepstrum]

Figure 1. The information generation process used by the ISIP software. Speech signals
are scaled according to human ear response characteristics using a "Mel-scale" filter bank.
Cepstral analysis and two derivative processes provide 36 voiced sound coefficients for each
time frame of speech data.

The objective of this effort was to evaluate the efficacy
of modafinil for sustaining alertness in personnel involved
in sustained field operations. All participants were males
between the ages of 18 and 34 years.


Participants hiked approximately 22 miles during the
first two days of the field event and then bivouacked for
the remaining 24 h of the study. While traveling the route,
participants performed 10 min of tests every 3 h. This
test block consisted of several simple cognitive tests (the
ARES test battery) performed on a personal digital assistant
(PDA) palmtop computer (Tungsten T), a subjective
sleepiness check (the Stanford Sleepiness Scale), a fatigue
questionnaire (the Sustained Operations Assessment Profile
or SOAP), and a mood questionnaire (Profile of Mood
States or POMS). Every 6 h along the route there was a
checkpoint during which the normal 10-min test block was
performed along with additional tests, including voice.
Voice was also recorded on the PDA using that device's
voice memo capability. Participants were not allowed to
sleep during the first night en route and were only allowed
a 2-h sleep period (0000-0200 h) during their second and
third nights on the trail.

[Figure 2 panel labels: MFCC 1-12 (three panels)]

Figure 2. Changes in the voice vector during four days of sleep restriction. Here we
illustrate the voice vector (36 MFCC components) generated by a single subject's utterance
of the sound "t." As quantified in the legend, the vectors during Trials 1 and 10
match better than the vectors at Trials 1 and 21.

Group 3 (Creare Group)

Eight subjects recited eight fixed phrases every 2 h during
a normal workday.

These volunteers were obtained from the Creare Inc.
employee population. Starting with their arrival at work
(between 7 and 8 a.m.), self-administered tests were
conducted approximately every 2 h until the end of
their workday (between 5 and 6 p.m.). Total time awake
for each subject was calculated and recorded based upon
their reported wake-up times on the day of testing. During
these tests, the volunteers read a series of phrases into
the same model PDA used for Group 2. Between tests the
subjects maintained normal work activities (office and
laboratory).

The Speech Recognition Software (SRS) described
in the previous section was used to process the recorded
speech samples into the 36-component voice vectors
associated with the speech phones contained within each
sample. These voice metrics were analyzed to determine
the degree of association between the voice vectors and the
speaker's measured and estimated levels of fatigue. Figure 3
illustrates this process.


Our initial analysis revealed that the individual MFCC
component most sensitive to fatigue varied from speaker
to speaker. As such, comparisons of individual voice vector
components would not generalize across a population
of speakers. Recalling that speech recognition software
can reliably recognize specific sounds spoken by a wide
range of speakers by analyzing the entire voice vector, we
calculated a correlation coefficient between the voice
vector at Trial 1 and that obtained during the trial of interest.
We call this metric the Voice Correlation or Vc. In
Figure 2 we showed that a speaker's voice 27 waking hours
after an initial utterance (12 h awake vs. 39 h awake) had
changed (Vc = 0.82); however, it was much closer to the
"rested state" voice than was the utterance made 66 h after
the initial one (Vc = 0.19, comparing voice at 12 h awake
with that at 78 h awake). Note that from this point on, the
term Vc (for Voice Correlation) will be used instead of the
generic correlation coefficient (CC) to emphasize that it
quantifies change in voice.
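
The Voice Correlation metric can be sketched as an ordinary Pearson correlation between two 36-component voice vectors. The function name and the random example vectors below are illustrative assumptions, not data or code from the study.

```python
import numpy as np

def voice_correlation(reference_vector, trial_vector):
    """Pearson correlation (Vc) between a rested-state voice vector
    and the voice vector from the trial of interest."""
    ref = np.asarray(reference_vector, dtype=float)
    trial = np.asarray(trial_vector, dtype=float)
    # corrcoef returns the 2x2 correlation matrix; take the off-diagonal.
    return float(np.corrcoef(ref, trial)[0, 1])

# Hypothetical 36-component vectors standing in for Trial 1 and a later trial.
rng = np.random.default_rng(1)
trial_1 = rng.standard_normal(36)
later_trial = trial_1 + 0.5 * rng.standard_normal(36)
vc = voice_correlation(trial_1, later_trial)
```

A value of Vc near 1 indicates that the later utterance closely resembles the rested-state utterance; lower values indicate a larger change in voice.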

Changes  in  the  resulting  voice  vectors  were  compared 
to  physiological  and  behavioral  measures  of  fatigue.  This 
is  discussed  in  the  next  section. 

Performance Models and Sleepiness Measures

We hypothesized that voice changes reflect the speaker's
level of sleepiness and, consequently, his or her level of
performance on alertness-dependent tasks. To test this
hypothesis, we compared voice with three models, or sets
of data: (1) sleep onset latency, (2) a parametric performance
model, and (3) a nonparametric performance model.

Sleep onset latency (SOL). Historically, the Multiple
Sleep Latency Test (MSLT) has been the primary objective
test used for the measurement of sleepiness and alertness.
The MSLT procedure was formalized
in 1977 to measure sleepiness in young normal subjects
involved in sleep deprivation experiments. The methodology
requires subjects to be put in bed during the wake
period and told to try to fall asleep. Each test is terminated
after 20 min if the subject does not fall asleep. If sleep occurs,
the subject is awakened after 60 sec of Stage 1 sleep.
The SOL is measured from lights out to the first minute of
Stage 1 sleep. Significant correlations often found between
the length of sleep loss and sleep latency give face validity
to the use of sleep onset latency as a biologically based
measure of sleepiness (Arand, Bonnet, Hurwitz, Mitler,
Rosa, & Sangal, 2005).

Figure 3. Performing voice quantification from the speech signal. As is common to most
speech recognition software, the front end of our system recognizes individual sounds by
matching the cepstral components of windowed segments of the voice signal to known
sound-cepstrum templates. We refer to these cepstral components as the voice vector.

Figure 4. Block diagram of the SAFTE Model. Under fully rested conditions, a person has
a finite maximal capacity to perform. This is represented by the reservoir capacity (Rc). In
our depiction of this model, adjustable flow valves (shown in the upper left and center right)
are used to represent flow control into and out of the reservoir. During waking activity, the
reservoir is depleted. During sleep the reservoir is replenished. The constant rate terms C1,
C2, and C3 are modeled according to each subject's test data. Circadian influences affect the
rate of the replenishment process and have an overall effect on performance.

Parametric performance model. Parametric models
are characterized by having a fixed structure derived from
prior knowledge of the system being modeled. This prior
knowledge can be taken from mathematical equations,
empirical relationships, or first principles. As might be
assumed, this prior knowledge requires a detailed and precise
understanding of the phenomenon being modeled.

One example of a well-developed parametric model of
predicted performance changes with regard to sleep cycles
is the Sleep, Activity, Fatigue, and Task Effectiveness
(SAFTE) model (Hursh et al., 2004). This overall parametric
sleep model, illustrated schematically in Figure 4, assumes
that each individual has a sleep-dependent reservoir of capacity
to perform cognitive tasks. Under fully rested conditions,
a person has a finite maximal capacity to perform,
represented in the figure as a reservoir value Rc. While the
individual is awake, this capacity is depleted. While asleep,
the reservoir (and hence capacity) is replenished.

In a parametric model, a fixed structure, developed from
an understanding of the system under study, imposes different
tasks on different parts of the model. As a result, interpretation
of the model's response to input parameter changes
can be based on the components of the actual system. For
example, the amount of replenishment that occurs during
sleep is dependent upon the depth and quality of sleep,
which, in turn, depends upon how sleepy the individual was
at sleep onset (C2*[Rc - Rt] in Figure 4) as well as the time
of sleep onset, relative to the sleeper's circadian phase. Waking
during sleep produces sleep fragmentation and causes a
decrease in replenishment of the reservoir. During waking
hours, the amount and type of activity that the individual
performs influences the amount of drain on the reservoir.
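
To make the reservoir idea concrete, the toy simulation below drains a reservoir in proportion to activity while awake and refills it in proportion to the deficit (Rc - Rt) while asleep. It is only a sketch of the two flow terms named in Figure 4, not the published SAFTE equations: the circadian terms and C3 are omitted, and the function name, the constants `rc`, `c1`, `c2`, and the time step are hypothetical values chosen for illustration.

```python
def simulate_reservoir(schedule, rc=100.0, c1=0.5, c2=0.2, dt=1.0):
    """Toy performance reservoir (illustrative only, not the SAFTE model).

    schedule: list of (awake, activity) tuples, one per time step of dt hours.
    Returns the reservoir level after each step.
    """
    level = rc  # start fully rested
    history = []
    for awake, activity in schedule:
        if awake:
            # Drain proportional to activity level (the C1*activity term).
            level -= c1 * activity * dt
        else:
            # Replenishment proportional to the deficit (the C2*[Rc - Rt] term).
            level += c2 * (rc - level) * dt
        level = min(max(level, 0.0), rc)  # keep within physical bounds
        history.append(level)
    return history

# 34 h awake at full activity, followed by 8 h of sleep.
levels = simulate_reservoir([(True, 1.0)] * 34 + [(False, 0.0)] * 8)
```

Because each term maps onto a part of the modeled system, changing one parameter (for example, the activity level) has a directly interpretable effect on the simulated decline.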

Nonparametric performance model. Nonparametric
models (such as artificial neural networks) do not require
a priori knowledge of the system under study and, as
such, are purely data-driven. In other words, a "black box"
mapping between the measured inputs and the resulting
outputs is determined. While nonparametric models often
perform better (from a prediction vs. measurement perspective),
their lack of a physiologically based structure
makes it difficult to relate the model's internal parameters
to the system under study.

Nonparametric models can be developed based upon
correlations with observed data and mathematical equations
which describe these data. The resulting model's
generalization capability relates to the model's ability to
predict new data.

Figure 5. Change in the voice vector versus change in sleepiness. Sleep onset latency
(SOL) trends downward over time for the FAA group average with circadian bumps
at 16 and 28 h awake (10 p.m. and 10 a.m.). Voice correlation (Vc) is the change in the
voice vector, as quantified by the correlation with the vector at Trial 1.

A nonparametric model was developed using an equation
for passive performance suggested by Gregory et al.
(2004), as shown in Equation 2. The left-hand factor
of Equation 2 reflects time awake, while the right-hand
factor reflects circadian influences. We determined the
nonparametric model for each subject by adjusting the "A"
and "peak" constants of Equation 2 and visually matching
the model to the SOL versus time awake data.

$$\text{Passive performance (alertness)} = \left[(31.4 - A)\,e^{-0.527\,\mathrm{SD}} + A\right]\left\{1 + 0.33\cos\left[\frac{6.28\,(\text{time} - \text{peak})}{24}\right]\right\} \tag{2}$$

where

SD = hours awake,
peak = time of day at peak performance,
time = time of day,
and
A is a curve-fitting coefficient.
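
Equation 2 translates directly into a short function. The sketch below uses the constants given in the equation; the function and argument names are our own, and the example values for `peak` and `a` are hypothetical, since the fitted per-subject values are not listed here.

```python
import math

def passive_performance(hours_awake, time_of_day, peak, a):
    """Passive performance (alertness) following Equation 2.

    hours_awake: SD in Equation 2.
    time_of_day, peak: hours on a 24-h clock.
    a: curve-fitting coefficient A.
    """
    # Left-hand factor: decay with time awake toward the floor value A.
    time_awake_term = (31.4 - a) * math.exp(-0.527 * hours_awake) + a
    # Right-hand factor: 24-h circadian modulation (6.28 approximates 2*pi).
    circadian_term = 1.0 + 0.33 * math.cos(6.28 * (time_of_day - peak) / 24.0)
    return time_awake_term * circadian_term

# Hypothetical example: peak alertness at 16:00 with A = 5.
at_wakeup_peak = passive_performance(0.0, 16.0, peak=16.0, a=5.0)
```

At zero hours awake and at the circadian peak, the expression reduces to 31.4 * 1.33, which gives a quick check that the two factors are combined as written.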

Using both SOL and performance models as benchmarks,
the present study tracked changes in the voice correlation
metric (described above) as the speakers in the
three test groups became fatigued. This is documented in
the following section.

TEST  RESULTS 

Voice  Change  With  Fatigue 

Voice versus SOL. Figure 5 shows the group average
change in both SOL and the Voice Correlation metric
for the sounds "p" (as in pea) and "t" (as in tea) over the
34-hour sleep loss testing period in the FAA study group.
It can be seen from this figure that changes in the voiced
"p" sound track sleepiness better than do changes in the
voiced "t" sound. The change in articulation of the "p"
sound tracks the change in sleepiness due to time awake
(the abscissa of Figure 5) and is less influenced by circadian
effects than is the sleep onset latency. Using time awake
as the independent variable, the correlation coefficient
(R) between SOL and time awake is -0.825, that between
Vc(p) and time awake is -0.89, and that between Vc(t) and
time awake is -0.67. From these numbers we estimate
(using the value R²) that time awake accounts for 68%,
79%, and 45% of the variation of SOL, Vc(p), and Vc(t),
respectively. It can be supposed that circadian influences
contribute a significant amount to the remaining variation
of SOL, but less so to voice change.
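
The percentages above follow from squaring each reported correlation coefficient. The short check below reproduces that arithmetic; the dictionary keys are labels chosen for this example.

```python
# Correlation of each measure with time awake, as reported in the text.
correlations = {"SOL": -0.825, "Vc(p)": -0.89, "Vc(t)": -0.67}

# Share of variation accounted for by time awake: 100 * R^2, rounded.
explained_percent = {
    name: round(100 * r ** 2) for name, r in correlations.items()
}
```

Squaring removes the sign, so a strong negative correlation (voice correlation falling as time awake rises) counts the same as a strong positive one.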

Voice Versus Performance Models

Figure 6 compares the FAA study-group average SOL
against both parametric (SAFTE) and nonparametric
models. For each subject, a circadian model (twin circadian
peaks) was determined by optimizing model output
with his or her temperature (measured at the ear). Combined
with sleep onset latency versus time awake data, a
SAFTE model was developed.

As shown in Figure 6, over the course of 34 h of wakefulness,
the nonparametric model, which is data-driven,
matches the circadian pattern of the SOL much more
closely than the SAFTE model.

Figure 7 compares the FAA study group's average voice
changes for the sound "p" with the two models. Unlike the
SOL example (Figure 6), the parametric SAFTE model
follows the voice-change data more closely than the
nonparametric model.

Figure 8 compares the Air Force placebo group's voiced
"p" sound data with the two models. Both models appear
to follow a trend in a similar manner to the voice data,
though the data-driven model (nonparametric) shows a
somewhat closer match. In this instance, the correlation
between the SAFTE model and Vc(p) is 0.53, while that
between the nonparametric model and Vc(p) is 0.81.

Figure 6. Performance models versus average FAA group sleepiness measurements.
While all three curves trend downward with hours awake, the nonparametric model
(Equation 2) shows a tighter fit to SOL than the parametric SAFTE model (R = 0.91
vs. 0.78).

Differences  Between  the  Groups 

The right panel of Figure 9 illustrates the voice change
metrics of the voiced "p" sound for our three subject groups.
The FAA group averages (filled circles) showed less change
over time than did the Air Force group (filled squares).

This difference might be explained by the relative level
of activity performed by each of the two groups. As illustrated
in the SAFTE model schematic diagram (Figure 4),
the rate of loss in the performance reservoir (Rt) is
proportional to the level of activity while awake. The left
panel of Figure 9 illustrates how changing this parameter
affects the rate of performance decline for a simulated
data set. The ability to perform this type of analysis is an
advantage of parametric modeling over nonparametric
modeling.

As discussed earlier, the FAA group performed significantly
less rigorous activity between testing periods than
the Air Force group (daily office-type routine vs. hiking).
In order to test the hypothesis that this difference is responsible
for the observed differential voice decrements
over time, comparisons were made between these data
and those obtained from the Creare test group. This group
experienced a slightly higher level of activity compared
with the FAA group. As illustrated by the filled triangle
symbols of Figure 9 (right panel), the voice data for the
Creare group also show only a slightly greater rate of
decline over time than those for the FAA group.

Figure 7. Change in the voice vector versus the performance models for the FAA
group average. Due to the limited effect of circadian rhythms on the voice data, the
SAFTE model provides a better match of the voice change pattern than the nonparametric
model of Equation 2 (R = 0.90 and 0.73, respectively).

Figure 8. Change in the sound "p" voice vector versus the performance models for
the Air Force placebo group average. The large change in effectiveness between the
rested state (Trial 1 at 12 h awake) and Trial 21 (at 78 h awake) is predicted by all three
approaches. The data-driven nonparametric model (Equation 2) provides a better
match to the voice change data than the SAFTE model (R = 0.81 vs. 0.53).

Differences  Between  the  Sounds 

Results show that voice sensitivity to fatigue depends,
in part, upon the sound being uttered. A possible explanation
for this can be based on the amount of airflow associated
with each sound.

Figure 9. Rate of voice change for all three test groups. As shown in the right panel, the
FAA group (closed circles) shows a much slower rate of voice change than does the Air Force
placebo group (closed squares). The average change in the voice correlation for the Creare
group (closed triangles) matches the FAA group much more closely than it does the Air Force
placebo group. The Creare and FAA groups had activity patterns over the testing
period that were more similar to each other than to that of the Air Force group. For all groups,
the voice metric is based upon the sound "p." Change in the SAFTE model performance
estimation with subject activity during periods awake is shown on the left panel. As the
activity drain of the SAFTE model (C1*activity in Figure 4) increases from 0.1 to 1.0, the
overall rate of performance decline increases while circadian and time awake related
patterns remain the same.

Figure 10. The association between average flow and estimation score of the Air Force
and FAA voice data. Here voice correlation with the nonparametric model is plotted versus
airflow. It can be seen that there is a significant correlation (P = .01) between the estimation
capability of a sound and the average airflow required to utter that sound.


Airflow in the respiratory tract is a function of driving
pressure and resistance. Driving pressure comes from
the lungs, and resistance is produced along the respiratory
tract. Common locations for modulated resistance include
the larynx and oral cavity. The relationship of airflow to
driving pressure and resistance can be represented as follows
(Equation 3):

$$U = \frac{P}{Z} \tag{3}$$

where U is the flow, P is the driving pressure, and Z is the airway resistance.

Table  1  lists  average  airflow  required  to  articulate  the 
sounds  analyzed  in  this  study. 

When one compares the average airflow for specific
consonants with the fatigue-related changes in the vocal
characteristics of those consonants, an association begins to emerge.
Generally, vocal changes in consonant sounds that require
a high average airflow were found to be more sensitive
to fatigue. For example, Figure 10 plots the correlation
between voice change and the nonparametric performance
model, for both the Air Force and the FAA test groups,
versus the average airflow for the monitored sounds.
We see a significant (P = .01) relationship
between the performance estimation ability of a voiced sound
and the average flow required to utter that sound.
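
One way such a relationship could be tested is sketched below. It assumes a Pearson correlation with a permutation-based p value; the article does not state which statistic was used, so this is a stand-in rather than the authors' procedure. The function names and the toy values are hypothetical and are not the study's data.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=10000, seed=0):
    """Two-sided p value: share of shuffles of y with |r| >= the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Toy values only (hypothetical, not the study's data).
toy_airflow = [100.0, 200.0, 300.0, 400.0, 500.0, 600.0]
toy_score = [0.10, 0.22, 0.28, 0.41, 0.52, 0.58]
r = pearson_r(toy_airflow, toy_score)
p = permutation_p_value(toy_airflow, toy_score)
```

A permutation test is used here because it makes no distributional assumption, which is convenient when only a handful of sounds are compared.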

DISCUSSION 

Periodic speech recordings were made under three separate
test protocol conditions, generally during (1) 2 days
of testing in a relaxed setting with no sleep, (2) a 4-day
hike with restricted sleep, and (3) a day of testing in a work
environment. With the aid of specially designed speech
recognition and voice component analysis software, these
data were analyzed to isolate the individual sounds (speech
phones) that are most sensitive to fatigue, and to quantify
the characteristics of these sounds. For particular sounds,
changes in these characteristics were used to quantify and
estimate the speaker's level of fatigue.


The testing conditions reported here indicate that, for
speech sounds requiring a large average airflow, a speaker's
voice changes in synchrony with both direct measures of
fatigue and with changes predicted by the length of time
awake. Comparison of voice change with models based
upon time awake, such as the SAFTE model, is limited
because time awake does not accurately
quantify the speaker's level of activity over that time.

While many physiological systems experience a circadian
influence, they do not do so with the same sensitivity.
This appears to be the case with sleepiness (as measured
by the sleep onset latency) and voice change.

Even with these differences, we believe that the association
between voice change, time awake, and performance
can be used as the basis for an operational setting in which
remotely monitored voices can be analyzed to estimate the
speaker's level of fatigue. The results presented here are a
first step in this development process.

Fatigue has previously been shown to affect voice at a
number of levels. These include anatomical timing between
articulation of sounds (Vollrath, 1994), time between
sounds within a word (Kruger & Vollrath, 1996), and total
word duration (Whitmore & Fisher, 1996). As is the case
with many biological systems, circadian effects and
biological individuality combine to make sensitivity to fatigue
individual-specific (Roth et al., 1989).


Table 1
Average Airflow Necessary to Generate the
Speech Sounds Used in Our Analysis

Sound    Average Airflow
N        968
/p/      933
161      525
/g/      372
/]/      133
/m/      168
/z/      159

Our approach to voice analysis is unique in that it does
not focus on a single, discrete voice parameter but,
instead, examines changes in a mathematical representation
of the entire voice (the cepstral components). This
representation is highly individual-specific; in fact, this is, in
part, how voice recognizers work.
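
To make the idea of cepstral components concrete, the sketch below computes a real cepstrum (the inverse Fourier transform of the log magnitude spectrum) for one short frame and measures how far two cepstra lie from each other. It is a minimal illustration under stated assumptions, not the analysis software used in this study: the naive transform, the function names, the choice of 12 coefficients, and the Euclidean distance are all assumptions.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform; the inverse applies the 1/n factor."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out


def real_cepstrum(frame, floor=1e-12):
    """Real cepstrum: inverse DFT of the log magnitude spectrum of the frame."""
    # The floor avoids log(0) for spectral bins with zero magnitude.
    log_mag = [math.log(max(abs(c), floor)) for c in dft(frame)]
    return [c.real for c in dft(log_mag, inverse=True)]


def cepstral_distance(a, b, n_coeffs=12):
    """Euclidean distance over coefficients 1..n_coeffs (coefficient 0 is skipped)."""
    pairs = zip(a[1:n_coeffs + 1], b[1:n_coeffs + 1])
    return math.sqrt(sum((p - q) ** 2 for p, q in pairs))


# Toy frame only (hypothetical samples, not recorded speech).
frame = [1.0, 0.5, 0.25, 0.125, 0.0, 0.0, 0.0, 0.0]
louder = [2.0 * v for v in frame]
shift = cepstral_distance(real_cepstrum(frame), real_cepstrum(louder))
```

Skipping coefficient 0 makes the distance insensitive to overall loudness, because scaling a frame only adds a constant to the log spectrum. A change in this distance relative to a rested baseline is one plausible way to express voice change as a single number.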

As a next step, the sensitivity and specificity of this
voice-based approach should be determined by way of
testing similar to that reported here with an analysis of
individual (as opposed to group average) voice data from
a significantly larger population of subjects.
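
For reference, sensitivity and specificity could be computed from per-subject classifications as sketched below. The function name and the toy labels are hypothetical; no such individual-level results are reported in this article.

```python
def sensitivity_specificity(truth, predicted):
    """Return (sensitivity, specificity) for paired boolean labels (True = fatigued)."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)          # fatigued, flagged
    fn = sum(1 for t, p in pairs if t and not p)      # fatigued, missed
    tn = sum(1 for t, p in pairs if not t and not p)  # rested, not flagged
    fp = sum(1 for t, p in pairs if not t and p)      # rested, flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one fatigued and one rested case")
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels only (hypothetical, not study results).
truth = [True, True, True, True, False, False, False, False]
predicted = [True, True, True, False, False, False, True, False]
sens, spec = sensitivity_specificity(truth, predicted)
```

Sensitivity is the fraction of truly fatigued speakers that the voice measure flags, and specificity is the fraction of rested speakers that it correctly leaves unflagged.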

The resulting fatigue estimation system should serve
as a decision aid. However, because of the variable sensitivity
and specificity typical of most biomedical measurements,
the final determination of fatigue-related effects should
still be the task of a human evaluator.

AUTHOR  NOTE 

The material presented in this article is based on work supported by
the United States Air Force under Contract F33615-03-C-6334. Any
opinions,  findings,  and  conclusions  or  recommendations  expressed 
in  this  article  are  those  of  the  author(s)  and  do  not  necessarily  reflect 
the  views  of  the  United  States  Air  Force.  Correspondence  concerning 